A Beam Search Algorithm for ITG Word Alignment
Abstract
Inversion transduction grammar (ITG) provides a syntactically motivated way to model the distortion of words between two languages. Although Viterbi ITG alignments can be found in polynomial time using a bilingual parsing algorithm, the computational complexity is still too high to handle real-world data, especially long sentences. As an alternative, we propose a simple and effective beam search algorithm. The algorithm starts with an empty alignment and keeps adding single promising links as early as possible until the model probability no longer increases. Experiments on Chinese-English data show that our algorithm is an order of magnitude faster than the bilingual parsing algorithm with bitext cell pruning, without loss in alignment or translation quality.
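The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the model score is a simple sum of per-link gains (`link_gain` is a hypothetical table of log-probability gains), alignments are one-to-one, and ITG validity is tested by checking that the aligned words avoid the 2413 and 3142 reordering patterns. The names `is_itg_valid` and `beam_search_align` are illustrative only.

```python
from itertools import combinations


def is_itg_valid(links):
    """Assumed ITG check: one-to-one links must avoid the 2413/3142 patterns."""
    targets = [t for _, t in sorted(links)]  # target positions in source order
    for a, b, c, d in combinations(targets, 4):
        if b < d < a < c or c < a < d < b:  # 3142 or 2413 pattern
            return False
    return True


def beam_search_align(link_gain, beam_size=4):
    """Grow alignments one link at a time, keeping the beam_size best.

    link_gain maps (source_index, target_index) to a hypothetical
    log-probability gain; an additive score is an assumption of this sketch.
    """
    def score(alignment):
        return sum(link_gain[link] for link in alignment)

    best = frozenset()   # start from the empty alignment
    beam = [best]
    while beam:
        candidates = set()
        for alignment in beam:
            used_src = {i for i, _ in alignment}
            used_tgt = {j for _, j in alignment}
            for (i, j), gain in link_gain.items():
                # Only add links that raise the score and keep it one-to-one.
                if gain <= 0 or i in used_src or j in used_tgt:
                    continue
                extended = alignment | {(i, j)}
                if is_itg_valid(extended):
                    candidates.add(extended)
        # Prune to the beam; the loop ends once no link improves any alignment.
        beam = sorted(candidates, key=score, reverse=True)[:beam_size]
        if beam and score(beam[0]) > score(best):
            best = beam[0]
    return best


# Toy example with made-up gains for a three-word sentence pair.
gains = {(0, 0): 2.0, (1, 1): 1.5, (0, 1): 0.5, (1, 0): -1.0, (2, 2): 1.0}
alignment = beam_search_align(gains)
print(sorted(alignment))  # [(0, 0), (1, 1), (2, 2)]
```

Because every added link must have a positive gain, the score of each hypothesis rises strictly at every step, so the loop stops as soon as no remaining link improves any alignment in the beam. A real model would rescore whole alignments rather than sum independent link gains.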
Similar resources
A Comparison of Syntactically Motivated Word Alignment Spaces
This work is concerned with the space of alignments searched by word alignment systems. We focus on situations where word re-ordering is limited by syntax. We present two new alignment spaces that limit an ITG according to a given dependency parse. We provide D-ITG grammars to search these spaces completely and without redundancy. We conduct a careful comparison of five alignment spaces, and sh...
Feature-Based ITG for Unsupervised Word Alignment
Inversion transduction grammar (ITG) [1] is an effective constraint on the word alignment search space. However, the traditional unsupervised ITG word alignment model is incapable of utilizing rich features. In this paper, we propose a novel feature-based unsupervised ITG word alignment model. With the...
A Comparative Study on Reordering Constraints in Statistical Machine Translation
In statistical machine translation, the generation of a translation hypothesis is computationally expensive. If arbitrary word reorderings are permitted, the search problem is NP-hard. On the other hand, if we restrict the possible word reorderings in an appropriate way, we obtain a polynomial-time search algorithm. In this paper, we compare two different reordering constraints, namely the ITG c...
Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in bot...
Improved Discriminative ITG Alignment using Hierarchical Phrase Pairs and Semi-supervised Training
While ITG has many desirable properties for word alignment, it still suffers from the limitation of one-to-one matching. While existing approaches relax this limitation using phrase pairs, we propose an ITG formalism, which even handles units of non-contiguous words, using both simple and hierarchical phrase pairs. We also propose a parameter estimation method, which combines the merits of both ...
Publication date: 2012